Amendment to the Claims: 

Please cancel claims 9 and 19, without prejudice. 
Please amend the claims as follows: 

This listing of claims will replace all prior versions, and listings, of claims in the
application:

Listing of Claims: 

Claims 1 to 2 (canceled) 

Claim 3 (previously presented): A method for identifying a high confidence 
functional link between at least two proteins, comprising the following steps: 

(a) identifying non-homologous proteins as being functionally linked by a 
"Rosetta Stone" method comprising the following steps 

(i) providing amino acid sequences of a first protein and a second protein, 
wherein the first and second proteins are not homologous, 

(ii) providing an amino acid sequence of a third protein, 

(iii) aligning amino acid sequence segments from the first protein and the second 
protein to the amino acid sequence of the third protein, wherein the amino acid sequence 
segments from the first and the second protein do not align to each other with any significant 
sequence similarity, and 

(iv) establishing whether the first and second proteins are functionally linked by 
determining whether a significant sequence similarity is present between the aligned amino acid 
sequences of step (iii), thereby identifying non-homologous proteins as being functionally 
linked; 

(b) identifying pairs of proteins in a genome as being functionally linked by a 
"phylogenetic profile" method comprising the following steps 

(i) providing a first plurality of protein sequences comprising substantially all 
protein sequences encoded by a first genome, 

(ii) providing a second plurality of protein sequences comprising substantially all 
protein sequences encoded by one or more additional genomes, 



Applicant : Marcotte et al. Attorney's Docket No.: 07419-021001 

Serial No. : 09/493,401 
Filed : January 28, 2000 
Page : 4 of 16 

(iii) comparing each protein sequence in the first plurality of protein sequences 
with substantially all the protein sequences of the second plurality of protein sequences to 
determine if a protein sequence in the first genome has a homolog in the one or more additional 
genomes based on the degree of similarity of the sequences being compared, 

(iv) generating a phylogenetic profile for each protein of the first genome, 
wherein the phylogenetic profile is a vector or pattern whose elements indicate whether a 
homolog of the corresponding protein is present or absent in the one or more additional genomes, 
and 

(v) grouping together proteins having similar phylogenetic profiles, wherein a 
similar phylogenetic profile indicates a functional link between the proteins; and 

(c) identifying pairs of proteins that are linked in both (a) and (b), thereby 
identifying a high confidence functional link between at least two proteins. 
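
For illustration only, steps (b)(iv), (b)(v) and (c) of claim 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicants' implementation: homolog detection is assumed to have been done already, "similar" profiles are simplified to identical profiles, and every name here (`phylogenetic_profiles`, `profile_links`, `high_confidence_links`, the toy proteins and genomes) is hypothetical.

```python
from itertools import combinations


def phylogenetic_profiles(homolog_presence, genomes):
    """Build one bit vector per protein: 1 if a homolog is present in that genome."""
    return {
        protein: tuple(1 if genome in present else 0 for genome in genomes)
        for protein, present in homolog_presence.items()
    }


def profile_links(profiles):
    """Link proteins whose profiles are identical (assumption: 'similar' means equal)."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(profiles), 2)
        if profiles[a] == profiles[b]
    }


def high_confidence_links(rosetta_links, profile_link_set):
    """Keep only the pairs linked by both methods, as in step (c)."""
    return rosetta_links & profile_link_set


# Toy data (hypothetical): genomes in which each protein has a homolog.
genomes = ["G1", "G2", "G3"]
homolog_presence = {
    "A": {"G1", "G3"},
    "B": {"G1", "G3"},
    "C": {"G2"},
}
profiles = phylogenetic_profiles(homolog_presence, genomes)

# Toy Rosetta Stone links, assumed to come from step (a).
rosetta_links = {frozenset(("A", "B")), frozenset(("A", "C"))}
links = high_confidence_links(rosetta_links, profile_links(profiles))
```

In this toy case only the pair A and B is linked by both methods, so it is the only pair retained.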

Claim 4 (currently amended): The method of claim 3, further comprising: 

(a) generating an expression profile for each protein of the first genome where the 
expression profile is a vector or a pattern whose elements indicate the level of mRNA expression 
of the corresponding gene in two or more DNA chip experiments; and 

(b) grouping together genes having similar expression profiles as identified in (a), 
where a similar expression profile indicates a functional link between proteins. 

Claim 5 (previously presented): The method of claim 4, further comprising 
displaying the functional links as networks of related proteins, comprising: 

placing a plurality of proteins in a diagram such that functionally linked proteins 
are closer together than all other proteins; and 

identifying groups of proteins that fall in a cluster in said diagram as functionally 
related. 



Claim 6 (previously presented): The method of claim 5, wherein the placing of 
the plurality of proteins in a diagram utilizes a computer. 
Claim 7 (previously presented): The method of claim 3, further comprising: 
identifying functional links for a plurality of protein pairs; 
placing substantially all protein pairs that are identified as functionally linked in a 
diagram such that functionally linked proteins are closer together than other proteins; and 

identifying groups of proteins that fall in a cluster in said diagram as functionally 
related. 



Claim 8 (previously presented): The method of claim 7, wherein the placing of 
substantially all protein pairs in a diagram utilizes a computer. 



Claim 9 (canceled) 



Claim 10 (currently amended): The method of claim 3 [[9]], wherein in the 
"Rosetta Stone" method establishing that the pair of non-homologous amino acid sequence 
segments of (i) have significant sequence similarities to different sequence segments of the 
protein of (ii) comprises showing that a computed probability (p) value is below a statistically 
significant threshold. 

Claim 11 (previously presented): The method of claim 10, wherein the 
probability threshold is set with respect to a value 1/N, wherein N is an integer based on the total 
number of protein sequences in a database. 
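
As a hedged illustration of claims 10 and 11, the significance test might look like the sketch below. It assumes the threshold is taken to be exactly 1/N, which is one possible reading of "set with respect to a value 1/N"; the function name `is_significant` and its parameters are hypothetical.

```python
def is_significant(p_value, database_size):
    """Return True when p_value falls below 1/N (assumed threshold).

    database_size is N, the total number of protein sequences in the database.
    """
    if database_size <= 0:
        raise ValueError("database_size must be a positive integer")
    return p_value < 1.0 / database_size


# With N = 100,000 sequences the assumed threshold is 1e-5.
strong_match = is_significant(1e-6, 100_000)
weak_match = is_significant(1e-4, 100_000)
```

Under this reading, a match with p = 1e-6 passes and one with p = 1e-4 does not.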

Claim 12 (currently amended): The method of claim 3 [[9]], wherein in the 
"Rosetta Stone" method the non-homologous amino acid sequence segments from different 
protein sequences of (i) are at least about 50 amino acid residues long. 

Claim 13 (currently amended): The method of claim 3 [[9]], wherein in the 
"Rosetta Stone" method the non-homologous amino acid sequence segments from different 
polypeptide sequences of (i) are between about 50 and about 1000 amino acid residues long. 
Claim 14 (currently amended): The method of claim 3 [[9]], wherein in the 
"Rosetta Stone" method statistically insignificant Rosetta stone links are filtered out when either 
protein in (i) has a plurality of homologs. 

Claim 15 (currently amended): The method of claim 3 [[9]], wherein in the 
"Rosetta Stone" method the plurality of homologues is more than about 100 homologues. 

Claim 16 (currently amended): The method of claim 3 [[9]], wherein in the 
"Rosetta Stone" method statistically insignificant Rosetta Stone links are filtered out when either 
protein in (i) forms a plurality of Rosetta Stone links to other distinct proteins. 

Claim 17 (previously presented): The method of claim 16, wherein the plurality 
of Rosetta Stone links is more than about 100. 

Claim 18 (previously presented): The method of claim 17, wherein the plurality 
of Rosetta Stone links is more than about 25. 



Claim 19 (canceled) 

Claim 20 (currently amended): The method of claim 3 [[19]], wherein the 
phylogenetic profile is generated using a bit type profiling method. 

Claim 21 (currently amended): The method of claim 3 [[19]], wherein the 
phylogenetic profile is generated using an evolutionary distance method. 

Claim 22 (currently amended): The method of claim 3 [[19]], wherein the 
phylogenetic profile is generated in a binary code describing the presence or absence of a given 
protein in an organism. 

Claim 23 (currently amended): The method of claim 3 [[19]], wherein the 
phylogenetic profile is generated in a continuous code that describes how similar the related 
sequences are in the different genomes. 

Claim 24 (currently amended): The method of claim 3 [[19]], wherein the 
phylogenetic profile is generated using an evolution probability process, wherein the process 
comprises 

(a) constructing a conditional probability matrix: p(aa → aa'), where aa and aa' 
are any amino acids, and the conditional probability matrix is constructed by converting an 
amino acid substitution matrix from a log odds matrix to a conditional probability matrix; 

(b) accounting for an observed alignment of the constructed conditional 
probability matrix by taking the product of the conditional probabilities for each aligned pair of 
amino acids during the alignment of the two protein sequences, represented by 

P(p) = ∏_n p(aa_n → aa'_n) 

(c) determining an evolutionary distance α from powers equation: 
p' = p^α(aa → aa'), maximizing for P. 
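
The maximization in step (c) can be illustrated with a deliberately simplified sketch. The assumptions are significant: the alphabet has two letters instead of twenty, the conditional probability matrix is a made-up toy with strictly positive entries, and only integer powers of the matrix are tried, whereas the claim does not restrict the evolutionary distance to integers. All names (`mat_mul`, `log_likelihood`, `best_integer_distance`) are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def log_likelihood(matrix, aligned_pairs):
    """Log of the product of conditional probabilities over aligned pairs."""
    return sum(math.log(matrix[i][j]) for i, j in aligned_pairs)


def best_integer_distance(p, aligned_pairs, max_alpha=20):
    """Try p**1 .. p**max_alpha and return the power with the highest likelihood."""
    best_alpha, best_score = None, -math.inf
    power = p
    for alpha in range(1, max_alpha + 1):
        score = log_likelihood(power, aligned_pairs)
        if score > best_score:
            best_alpha, best_score = alpha, score
        power = mat_mul(power, p)
    return best_alpha


# Toy two-letter conditional probability matrix; each row sums to 1.
p = [[0.9, 0.1], [0.1, 0.9]]
# Toy alignment: 8 identical pairs and 2 substitutions (indices into the alphabet).
aligned_pairs = [(0, 0)] * 8 + [(0, 1)] * 2
alpha = best_integer_distance(p, aligned_pairs)
```

A fuller treatment would search over real-valued powers, for example through an eigendecomposition of the matrix, which this sketch omits.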

Claim 25 (previously presented): The method of claim 24, wherein the 
conditional probability matrix is defined by a Markov process with substitution rates over a fixed 
time interval. 

Claim 26 (previously presented): The method of claim 24, wherein the 
conversion from an amino acid substitution log odds matrix to a conditional probability matrix is 
represented by: 

P(i → j) = P(j) · 2^(BLOSUM62_ij / 2), 

where BLOSUM62 is an amino acid substitution log odds matrix, and P(i → j) is 
the probability that amino acid i is replaced by amino acid j through point mutations according to 
BLOSUM62 scores. 
Claim 27 (previously presented): The method of claim 26, wherein Pj's are the 
abundances of amino acid j and are computed by solving a plurality of linear equations given by 
the normalization condition that Σ_j P(i → j) = 1. 
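
To make the conversion in claims 26 and 27 concrete, here is a small sketch that reads the relation as P(i → j) = P(j) · 2^(S_ij / 2) and solves the normalization conditions Σ_j P(i → j) = 1 for the abundances P(j). It is a sketch under assumptions: the 2×2 log odds matrix `toy_log_odds` is invented for illustration (it is not BLOSUM62), and the helper names `solve_linear` and `conditional_probabilities` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def conditional_probabilities(log_odds):
    """Convert a log odds matrix into abundances and a conditional matrix."""
    n = len(log_odds)
    # weights[i][j] = 2^(S_ij / 2), the odds factor for replacing i by j.
    weights = [[2.0 ** (log_odds[i][j] / 2.0) for j in range(n)] for i in range(n)]
    # One linear equation per row i: sum_j P(j) * weights[i][j] = 1.
    abundances = solve_linear(weights, [1.0] * n)
    cond = [[abundances[j] * weights[i][j] for j in range(n)] for i in range(n)]
    return abundances, cond


toy_log_odds = [[1, -2], [-2, 1]]  # hypothetical two-letter matrix
abundances, cond = conditional_probabilities(toy_log_odds)
```

Each row of `cond` sums to 1 by construction, which is the normalization condition the claim relies on.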



